Reinforcement Control via Heuristic Dynamic Programming

Author

  • K. Wendy Tang
Abstract

Heuristic Dynamic Programming (HDP) is the simplest kind of Adaptive Critic, which is a powerful form of reinforcement control [1]. It can be used to maximize or minimize any utility function, such as total energy or trajectory error, of a system over time in a noisy environment. Unlike supervised learning, adaptive critic design does not require that the desired control signals be known. Instead, feedback is obtained from a critic network, which learns the relationship between a set of control signals and the corresponding strategic utility function. It is an approximation of dynamic programming [2]. A simple Heuristic Dynamic Programming (HDP) system involves two subnetworks, the Action network and the Critic network. Each of these networks includes a feedforward and a feedback component. A flow chart for the interaction of these components is included. To further illustrate the algorithm, we use HDP for the control of a simple, 2-D planar robot.
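
The interplay of the Action and Critic components described in the abstract can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the paper's implementation: the plant is a hypothetical scalar linear system rather than the 2-D planar robot, each "network" is reduced to a single parameter, and the names `train_hdp`, `w`, `k`, the discount factor `gamma`, and all numeric values are invented for this example.

```python
import random


def train_hdp(a=0.9, b=0.5, r=0.1, gamma=0.95,
              lr_critic=0.1, lr_action=0.1, steps=20000, seed=0):
    """Toy HDP loop on the assumed scalar plant x_next = a*x + b*u.

    Critic:  J(x) ~= w * x**2   (estimate of the discounted cost-to-go)
    Action:  u = -k * x         (state-feedback control law)
    Utility: U = x**2 + r*u**2  (to be minimized over time)
    All constants are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    w = 0.0  # critic parameter
    k = 0.0  # action parameter
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # sample a training state

        # Feedforward pass: action proposes u, the plant model predicts x_next.
        u = -k * x
        x_next = a * x + b * u
        utility = x * x + r * u * u

        # Critic feedback: move J(x) toward U + gamma * J(x_next).
        target = utility + gamma * w * x_next * x_next
        critic_error = w * x * x - target
        w_step = lr_critic * critic_error * (x * x)  # dJ/dw = x**2

        # Action feedback: differentiate U + gamma * J(x_next) with respect
        # to u through the plant model, then chain to k via du/dk = -x.
        dcost_du = 2.0 * r * u + gamma * 2.0 * w * x_next * b
        k_step = lr_action * dcost_du * (-x)

        w -= w_step
        k -= k_step
    return w, k


if __name__ == "__main__":
    w, k = train_hdp()
    print("critic weight:", w)
    print("feedback gain:", k)
    print("closed-loop factor a - b*k:", 0.9 - 0.5 * k)
```

No desired control signal appears anywhere in the loop: the action parameter is adjusted only through the critic's estimate of future cost, which is the distinction from supervised learning that the abstract draws. In a fuller HDP system both `w` and `k` would be replaced by neural networks, and the plant model might itself be learned.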


Similar resources

Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method

In this paper, with the aim of estimating the internal dynamics matrix of a gimbaled Inertial Navigation System (as a discrete linear system), the discrete-time Hamilton-Jacobi-Bellman (HJB) equation for optimal control has been extracted. A Heuristic Dynamic Programming (HDP) algorithm for solving the equation has been presented and then a neural network approximation for cost function and control input ...


Call Admission Control and Routing in Integrated Service Networks Using Reinforcement Learning

In integrated service communication networks, an important problem is to exercise call admission control and routing so as to optimally use the network resources. This problem is naturally formulated as a dynamic programming problem, which, however, is too complex to be solved exactly. We use methods of reinforcement learning (RL), together with a decomposition approach, to find call admission ...


Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks

In integrated service communication networks, an important problem is to exercise call admission control and routing so as to optimally use the network resources. This problem is naturally formulated as a dynamic programming problem, which, however, is too complex to be solved exactly. We use methods of reinforcement learning (RL), together with a decomposition approach, to find call admission ...


Reinforcement Learning in the brain

The modern form of RL arose historically from two separate and parallel lines of research. The first axis is mainly associated with Richard Sutton, formerly an undergraduate psychology major, and his doctoral thesis advisor, Andrew Barto, a computer scientist. Interested in artificial intelligence and agent-based learning and inspired by the psychological literature on Pavlovian and instrumenta...


Learning to control forest fires with ESP

Reinforcement Learning (Kaelbling et al., 1996) can be used to learn to control an agent by letting it interact with its environment. In general there are two kinds of reinforcement learning: (1) value-function based reinforcement learning, which is based on the use of heuristic dynamic programming algorithms such as temporal difference learning (Sutton, 1988) and Q-learning (Watkins, 1989), a...




Publication date: 2007